A Project by Shutong Li, Mingqi Cao, and Hardik Hedaoo
The gesture-controlled music player combines Pygame and OpenCV on a Raspberry Pi. The program is divided into two main parts: Pygame handles the user interface and media playback control, while OpenCV and MediaPipe perform the gesture recognition that controls the player remotely. Ultimately, the music player achieves four functions: switching songs, play/pause control, volume adjustment, and song selection.
Existing mainstream music players are generally controlled by touching a screen or pressing mechanical buttons. In this final project, the gesture music player controls playback purely by recognizing hand movements.
First, install OpenCV on the RPi and use the MediaPipe Python library to identify and track the positions of the 21 key points of the hand. Figure 1 shows the layout of the 21 key points on the palm.
Use OpenCV's cv2.VideoCapture to initialize the camera and set its resolution. Then create a MediaPipe Hands object and set the minimum detection confidence and minimum tracking confidence.
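A condensed excerpt of this setup (the same values appear in the full gesture_control.py listing at the end of the page):

import cv2
import mediapipe as mp

# Open the camera and fix the frame size
cap = cv2.VideoCapture(0)
cap.set(3, 640)   # property 3: frame width
cap.set(4, 480)   # property 4: frame height

# Create the MediaPipe Hands detector with the confidence thresholds we use
hands = mp.solutions.hands.Hands(
    min_detection_confidence=0.8,
    min_tracking_confidence=0.5)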
Loop over each frame captured by the camera. Flip the frame and convert it to RGB format, because MediaPipe requires its input images to be in RGB.
Use MediaPipe Hands to process each frame and obtain the positions of the hand key points. If hand keypoints are detected, MediaPipe returns a list of keypoints, each with x and y coordinates in the image, and the program parses this list. It determines which fingers are straight by comparing each fingertip keypoint with the y coordinate of the joint below it: if the fingertip's y coordinate is smaller than that of the PIP joint, the finger can be considered straight. The figure below illustrates gesture detection.
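In code, this check looks as follows (condensed from the full gesture_control.py listing; tipIds holds the landmark indices of the five fingertips, and index 2 of each lmList entry is the y pixel coordinate):

tipIds = [4, 8, 12, 16, 20]   # landmark IDs of the thumb through pinky tips

fingers = []
for id in range(1, 5):        # index finger through pinky (thumb excluded)
    # tipIds[id] - 2 is the PIP joint below that fingertip; a smaller y
    # means the tip is higher in the image, i.e. the finger is straight
    if lmList[tipIds[id]][2] < lmList[tipIds[id] - 2][2]:
        fingers.append(1)
    else:
        fingers.append(0)
totalFingers = fingers.count(1)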
The x and y values are coordinates within the corresponding recognition range, and the arrows represent the direction of finger movement. By counting the raised fingers and tracking the hand's coordinates, the program triggers the corresponding functions, such as switching songs, adjusting the volume, and exiting.
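Concretely, the mapping implemented in gesture_control.py is:

#   0 raised fingers           -> pause.txt      (toggle play/pause)
#   1 finger, tip x < 300      -> left.txt       (previous track)
#   1 finger, tip x > 400      -> right.txt      (next track)
#   1 finger, centered         -> 1.txt          (select track 1)
#   2 / 3 raised fingers       -> 2.txt / 3.txt  (select tracks 2 and 3)
#   4 fingers, hand y < 210    -> up.txt         (volume up, or scroll up in the playlist)
#   4 fingers, hand y > 230    -> down.txt       (volume down, or scroll down)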
When an operation is triggered, a specific flag file (such as left.txt or right.txt) is created in the file system; the music player's main loop detects the presence of this file and performs the corresponding action. After the gesture has been recognized and handled, the flag file is deleted to prevent the command from being executed repeatedly. Meanwhile, the camera continues to capture the next frame, and the whole process loops to achieve real-time gesture recognition and response.
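In sketch form, the producer and consumer sides of this file-based signaling look like this (paths are relative to the project directory):

import os

# gesture_control.py side: raise a flag when a gesture fires
os.system("touch right.txt")

# imusic.py side: inside the main loop, consume the flag exactly once
if os.path.exists("right.txt"):
    # ... advance to the next track ...
    os.system("rm right.txt")   # delete the flag so the command is not repeated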
The entire interface design is built around Pygame's rendering loop, which implements an interactive music playback interface by regularly drawing graphic elements and processing user input.
By displaying buttons and lyrics on the screen, users can interact with the player through the touch screen or four GPIO mechanical switches to select songs, control play/pause, adjust the volume, and so on. The music player also supports gesture control for the same functions.
The music playback interface is divided into two levels: Level 1 displays the song playlist, and Level 2 jumps to the scrolling-lyrics interface once the user starts playback. The player uses the rolling_lyrics() function to implement lyric scrolling.
Lyrics are stored in a list, and the function updates the displayed lines according to the current line and the scrolling speed. We were not yet able to display the lyrics in real-time sync with the music; instead, the lyrics simply move upward at a fixed speed and disappear at the edge of the screen.
Figure: complete lyrics scroll.
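The line-advance timing inside rolling_lyrics() is driven by elapsed wall-clock time, so longer lines stay on screen longer:

elapsed_time = time.time() - scroll_timer
# A line scrolls for a duration proportional to its length
if elapsed_time > len(lyrics[current_line]) / scroll_speed:
    current_line += 1
    scroll_timer = time.time()
    scroll_position = 0
# Convert elapsed time into a pixel offset for this frame
scroll_position = int(elapsed_time * scroll_speed)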
Initially, there was an issue where the cv2 module (OpenCV) was not found when running gesture_control.py through a shell script (start.sh), despite working fine when run independently. This indicated an environment discrepancy: the Python environment invoked by the shell script did not have the cv2 module installed. This is a common issue in Python development, where different environments (or different Python interpreters) have different sets of installed modules.
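A quick way to see which interpreter (and module search path) a script is actually running under, which is how this kind of discrepancy can be diagnosed:

import sys
print(sys.executable)   # path of the interpreter running this script
print(sys.path)         # where it will look for modules such as cv2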
The project involved integrating two Python scripts: imusic.py for music playback control and gesture_control.py for gesture recognition. The integration required gesture_control.py to trigger actions in imusic.py based on recognized hand gestures. This was achieved by using file creation as a signaling mechanism between the two scripts: specific gestures create designated files, which imusic.py detects and responds to by controlling music playback.
There was a requirement for imusic.py to be run with sudo privileges when started from gesture_control.py. This presented a challenge, as running scripts with sudo can introduce security risks and complications, especially in a non-interactive environment. The solution was to use the subprocess module to start imusic.py with sudo, ensuring that the script had the necessary permissions to perform its operations.
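The launch call, as it appears (commented out) in the gesture_control.py listing below:

import subprocess

# Launch the player with root privileges; Popen does not block,
# so gesture recognition keeps running in parallel
subprocess.Popen(["sudo", "/usr/bin/python3", "/home/pi/Project/imusic.py"])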
Two specific issues were encountered with gesture recognition: rapid toggling between play and pause states, and a conflict between selecting the second music track and the swipe gestures. To address these, the gesture recognition logic in gesture_control.py was refined: a delay was introduced to prevent rapid toggling, and the gesture detection was adjusted to differentiate more clearly between swiping and selecting the second track. This involved fine-tuning the gesture recognition parameters and adding conditional checks to distinguish between different types of gestures.
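A minimal sketch of such a cooldown (the names DEBOUNCE_S and fire_gesture are illustrative, not from the project code):

import os
import time

DEBOUNCE_S = 1.0        # assumed cooldown between accepted gestures
last_trigger = 0.0

def fire_gesture(flag_file):
    # Create the flag file only if the cooldown has elapsed
    global last_trigger
    now = time.time()
    if now - last_trigger >= DEBOUNCE_S:
        os.system("touch " + flag_file)
        last_trigger = now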
A further requirement was to allow the script to be terminated using a GPIO button on the Raspberry Pi. This was implemented by setting up a GPIO listener in gesture_control.py: when the designated button is pressed, the script performs a cleanup, releasing resources and terminating gracefully. This feature adds a physical, hardware-based control mechanism to the project.
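A sketch of this listener, mirroring the GPIO 17 handler already used in imusic.py (the callback name and cleanup body here are illustrative):

import sys
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
GPIO.setup(17, GPIO.IN, pull_up_down=GPIO.PUD_UP)

def quit_callback(channel):
    # Release the camera and close windows before exiting
    cap.release()
    cv2.destroyAllWindows()
    sys.exit()

GPIO.add_event_detect(17, GPIO.FALLING, callback=quit_callback, bouncetime=500)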
The project requirements were basically met: the music player can be controlled with gestures. The main drawback is that the camera is very sensitive and has a high transmission delay. Concretely, if two or more palms appear in the frame, the system makes wrong judgments; gestures also must not be made too quickly during testing, because the system latency is high and fast movements often cannot be captured by the picamera, making them impossible to recognize.
Overall, this was a great project. We used a Raspberry Pi and OpenCV to integrate a traditional music player with gesture-recognition control, and the different operating modes give completely different experiences. The test results clearly leave room for improvement, such as letting the player search for songs online, or scrolling the lyrics in the second-level interface in time with the music. In future work we may further revise the code or replace the camera to improve the transmission speed. There is no doubt that the combination of OpenCV and the RPi can lead to even better projects.
sl2865@cornell.edu
Designed and tested the initial interface of the music player
hgh36@cornell.edu
Designed and tested the gesture control component
mc2592@cornell.edu
Designed and tested the music control logic and the playing interface of the app
# imusic.py
from pygame.locals import *
import RPi.GPIO as GPIO
import pygame
import time
import os, sys
# set up the environment variables
os.putenv('SDL_VIDEODRIVER', 'fbcon') # Display on piTFT
os.putenv('SDL_FBDEV', '/dev/fb1')
os.putenv('SDL_MOUSEDRV', 'TSLIB') # Track mouse clicks on piTFT
os.putenv('SDL_MOUSEDEV', '/dev/input/touchscreen')
GPIO.setmode(GPIO.BCM)
GPIO.setwarnings(False)
GPIO.setup(17, GPIO.IN, pull_up_down=GPIO.PUD_UP)
def GPIO17_callback(channel):
    # Physical quit button wired to GPIO 17
    print("Button 17 has been pressed")
    sys.exit()
STOP = True
RUN = True
PLAY = False
# Initialize the Pygame
pygame.init()
#pygame.mouse.set_visible(False)
# Basic configuration for the game
size = (width, height) = (320, 240)
screen = pygame.display.set_mode(size)
FPS = 40
clock = pygame.time.Clock()
WHITE = 255,255,255
BLACK = 0,0,0
RED = 255,0,0
GREEN = 0,255,0
background_image = pygame.image.load("bg.jpeg")
background_image = pygame.transform.scale(background_image, (width, height))
# Load lyrics
lyrics = [
"I pop the pain away,",
"I slide the pain away",
"I pop the pain away",
"I slide the pain away",
"I picked you up when you",
"Fell and cut your knee",
"Told you not to cry",
"and held you Close to me",
"I hope I'm not too late",
"To set my demons straight",
"I know I made you wait",
"But how much can you take?",
"I hope you see the God in me",
"I hope you can see",
"And if it's up,",
"stay down for me",
"Yeah",
]
#music list
music_list = [
"Die Hard Drive",
"Miracle on the 34th Line of Code",
"The Grinch Who Hacked Chreistmas",
"This is line 1",
"This is line 2",
"This is line 3",
"This is line 4",
"This is line 5",
"And so on...",
]
# Button and Text info configuration
history_content_font = pygame.font.Font(None, 20)
history_title_font = quit_button_font = pygame.font.Font(None, 30)
center_button_font = pygame.font.Font(None, 20)
font = pygame.font.Font(None, 36)
quit_button = {"prev": (50,220), "next":(270, 220)}
stop_button = {'stop': (160, 220)}
def draw_buttons():
    # draw the prev/next buttons
    for text, position in quit_button.items():
        text_surface = quit_button_font.render(text, True, WHITE)
        rect = text_surface.get_rect(center=position)
        screen.blit(text_surface, rect)
    # draw the center button
    for text, position in stop_button.items():
        if STOP:
            text = "play"
        text_surface = center_button_font.render(text, True, BLACK)
        rect = text_surface.get_rect(center=position)
        screen.blit(text_surface, rect)
def draw_circle():
    if STOP:
        # draw green color when panic stopped
        pygame.draw.circle(screen, GREEN, (160, 220), 20)
    else:
        # draw red color when not panic stopped
        pygame.draw.circle(screen, RED, (160, 220), 20)
def render_lines(start_line, num_lines):
    rendered_lines = []
    for i in range(start_line, start_line + num_lines):
        rendered_lines.append(font.render(lyrics[i], True, WHITE))
    return rendered_lines
scroll_speed = 15 # pixels per second
current_line = 0
scroll_timer = time.time()
scroll_position = 0
frozen = False
frozen_lines = []
elapsed_time = time.time() - scroll_timer
start_index = 0
selected_index = None
# Set up scrolling variables
scroll_pos = 0
button_height = font.get_linesize() + 10
def rolling_lyrics():
    global scroll_speed, current_line, scroll_timer, scroll_position, frozen, frozen_lines, lyrics, elapsed_time
    lines_to_display = 4
    if current_line + lines_to_display >= len(lyrics):
        current_line = 0
        frozen = True
        frozen_lines = lyrics[-3:]
    # Display three lines at a time based on the reverse scroll position
    if not frozen:
        # Calculate scrolling
        for i, text_line in enumerate(render_lines(current_line, lines_to_display)):
            y = height // 2 - font.get_linesize() // 2 + i * font.get_linesize() - scroll_position
            screen.blit(text_line, (width // 2 - text_line.get_width() // 2, y))
        elapsed_time = time.time() - scroll_timer
        if elapsed_time > len(lyrics[current_line]) / scroll_speed:
            current_line += 1
            scroll_timer = time.time()
            scroll_position = 0
        # Update scroll position based on elapsed time
        scroll_position = int(elapsed_time * scroll_speed)
    # Display the frozen lines when scrolling is frozen
    if frozen:
        for i, frozen_line in enumerate(lyrics[-3:]):
            y = height // 2 - font.get_linesize() // 2 + i * font.get_linesize()
            text_line = font.render(frozen_line, True, WHITE)
            screen.blit(text_line, (width // 2 - text_line.get_width() // 2, y))
    pygame.display.flip()
    clock.tick(30)
def play_music(music):
    screen.blit(background_image, (0, 0))
    draw_circle()
    draw_buttons()
    # cmd = "mplayer /home/pi/Music/" + str(music) + ".mp3"
    # os.system(cmd)
    rolling_lyrics()
    pygame.display.flip()
    clock.tick(30)
# Function to render buttons
def render_buttons(start_index):
    rendered_buttons = []
    for i, music_item in enumerate(music_list[start_index:start_index + 5]):
        button_y = i * button_height + scroll_pos
        button_rect = pygame.Rect(50, button_y, width - 100, button_height)
        rendered_buttons.append((music_item, button_rect))
    return rendered_buttons
def init_interface():
    global RUN, start_index, selected_index, scroll_pos, scroll_speed, button_height
    # Set up the clock
    clock = pygame.time.Clock()
    font = pygame.font.Font(None, 20)
    # Set up button variables
    button_color = (100, 100, 255)
    selected_button_color = (255, 100, 100)
    # Display buttons
    for music_item, button_rect in render_buttons(start_index):
        button_color_to_use = selected_button_color if selected_index == start_index else button_color
        pygame.draw.rect(screen, button_color_to_use, button_rect)
        text = font.render(music_item, True, (0, 0, 0))
        screen.blit(text, (button_rect.x + 10, button_rect.y + 5))
    pygame.display.flip()
    clock.tick(30)  # Adjust the frame rate as needed
if __name__ == "__main__":
    ts = time.time()
    GPIO.add_event_detect(17, GPIO.FALLING, callback=GPIO17_callback, bouncetime=500)
    volume = 0.5
    while RUN and (time.time() - ts < 120):
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                RUN = False
            elif event.type == pygame.MOUSEBUTTONDOWN:
                pass
                # if event.button == 4:  # Scroll Up
                #     start_index = max(0, start_index - 1)
                # elif event.button == 5:  # Scroll Down
                #     start_index = min(len(music_list) - 5, start_index + 1)
                # if event.button == 1:  # Left mouse button
                #     for i, (_, button_rect) in enumerate(render_buttons(start_index)):
                #         if button_rect.collidepoint(event.pos):
                #             PLAY = True
                #             music = i + 1 + start_index
                #             # Add action when a button is clicked here
        cmd = ""
        if os.path.exists("left.txt") and PLAY:  # prev
            # os.system("echo q > /home/pi/Project/fifo")
            # cmd = "sudo SDL_VIDEODRIVER=fbcon SDL_FBDEV=/dev/fb1 mplayer -vo sdl -framedrop -input file=/home/pi/Project/fifo /home/pi/Project/" + str((music) % 3) + ".mp3"
            music = (music - 2) % 3 + 1  # step back one track, wrapping over tracks 1-3
            pws = "/home/pi/Project/" + str(music) + ".mp3"
            pygame.mixer.music.load(pws)
            pygame.mixer.music.play()
            frozen = not frozen
            STOP = not STOP
            # os.system(cmd)
            os.system("rm left.txt")
elif os.path.exists("right.txt") and PLAY: #next
music = (music + 1) % 3 + 1
pws = "/home/pi/Project/" + str(music) + ".mp3"
pygame.mixer.music.load(pws)
pygame.mixer.music.play()
frozen = not frozen
STOP = not STOP
#os.system(cmd)
os.system("rm right.txt")
elif os.path.exists("down.txt") and PLAY: #vulume down
volume = max(0, volume - 0.1)
pygame.mixer.music.set_volume(volume)
os.system("rm down.txt")
elif os.path.exists("up.txt") and PLAY: #vulume up
volume = min(1, volume + 0.1)
pygame.mixer.music.set_volume(volume)
os.system("rm up.txt")
elif os.path.exists("pause.txt") and PLAY:
STOP = not STOP
frozen = not frozen
pygame.mixer.music.pause()
os.system("rm pause.txt")
elif os.path.exists("up.txt") and not PLAY:
start_index = max(0, start_index - 1)
os.system("rm up.txt")
elif os.path.exists("down.txt") and not PLAY:
start_index = min(len(music_list) - 5, start_index + 1)
os.system("rm down.txt")
elif os.path.exists("1.txt") and not PLAY: #next
pws = "/home/pi/Project/" + str((1 + start_index) % 3 + 1) + ".mp3"
pygame.mixer.music.load(pws)
pygame.mixer.music.play()
#os.system(cmd)
music = 1 + start_index
PLAY = True
os.system("rm 1.txt")
elif os.path.exists("2.txt") and not PLAY:
cmd = "/home/pi/Project/" + str((start_index + 2) % 3 + 1) + ".mp3"
#os.system(cmd)
pygame.mixer.music.load(cmd)
music = 2 + start_index
PLAY = True
os.system("rm 2.txt")
elif os.path.exists("3.txt") and not PLAY:
cmd = "/home/pi/Project/" + str((start_index + 3) % 3 + 1) + ".mp3"
#os.system(cmd)
pygame.mixer.music.load(cmd)
music = 3 + start_index
PLAY = True
os.system("rm 3.txt")
elif os.path.exists("4.txt") and not PLAY: #next
cmd = "/home/pi/Project/" + str((start_index + 4) % 3 + 1) + ".mp3"
music = 4 + start_index
pygame.mixer.music.load(cmd)
PLAY = True
os.system("rm 4.txt")
elif os.path.exists("5.txt") and not PLAY:
cmd = "/home/pi/Project/" + str((start_index + 5) % 3 + 1) + ".mp3"
os.system(cmd)
music = 5 + start_index
pygame.mixer.music.load(cmd)
PLAY = True
os.system("rm 5.txt")
        screen.blit(background_image, (0, 0))
        if not PLAY:
            init_interface()
        else:
            play_music(music)
        # os.system(cmd)
        cmd = ""
    pygame.quit()
# gesture_control.py
import cv2
import mediapipe as mp
import pyautogui
import os
#import subprocess
imusic_started = False
mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
# Constants for the webcam window size
wCam, hCam = 640, 480
# IDs of finger tips in hand landmarks
tipIds = [4, 8, 12, 16, 20]
state = None
def fingerPosition(image, handNo=0):
    # Collect the pixel coordinates of all 21 hand landmarks.
    # Note: this reads the global `results` set in the main loop below.
    lmList = []
    if results.multi_hand_landmarks:
        myHand = results.multi_hand_landmarks[handNo]
        for id, lm in enumerate(myHand.landmark):
            h, w, c = image.shape
            cx, cy = int(lm.x * w), int(lm.y * h)
            lmList.append([id, cx, cy])
    return lmList
# Setup the webcam capture
cap = cv2.VideoCapture(0)
cap.set(3, wCam)
cap.set(4, hCam)
with mp_hands.Hands(
        min_detection_confidence=0.8,
        min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            print("Ignoring empty camera frame.")
            continue
        # Flip the image horizontally for a later selfie-view display, and convert the BGR image to RGB.
        image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
        # Process the image and draw landmarks
        results = hands.process(image)
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
        lmList = fingerPosition(image)
        if len(lmList) != 0:
            fingers = []
            for id in range(1, 5):
                # A finger counts as raised when its tip is above its PIP joint
                if lmList[tipIds[id]][2] < lmList[tipIds[id] - 2][2]:
                    fingers.append(1)
                else:
                    fingers.append(0)
            totalFingers = fingers.count(1)
            # if not imusic_started:
            #     subprocess.Popen(["sudo", "/usr/bin/python3", "/home/pi/Project/imusic.py"])
            #     imusic_started = True
            # if totalFingers == 4:
            #     state = "Play"
            # if totalFingers == 0 and state == "Play":
            #     state = "Pause"
            #     pyautogui.press('space')
            # if totalFingers == 1:
            #     if lmList[8][1] < 300:
            #         pyautogui.press('left')
            #     if lmList[8][1] > 400:
            #         pyautogui.press('right')
            # if totalFingers == 2:
            #     if lmList[9][2] < 210:
            #         pyautogui.press('up')
            #     if lmList[9][2] > 230:
            #         pyautogui.press('down')
            # if totalFingers == 0:
            #     os.system("touch pause.txt")
            #     state = "Pause"
            #     pyautogui.press('space')
            if totalFingers == 1:
                if lmList[8][1] < 300:
                    os.system("touch left.txt")    # swipe left: previous track
                    # pyautogui.press('left')
                elif lmList[8][1] > 400:
                    os.system("touch right.txt")   # swipe right: next track
                    # pyautogui.press('right')
                else:
                    os.system("touch 1.txt")       # centered: select track 1
            if totalFingers == 4:
                if lmList[9][2] < 210:
                    os.system("touch up.txt")      # hand high: volume/scroll up
                    # pyautogui.press('up')
                if lmList[9][2] > 230:
                    os.system("touch down.txt")    # hand low: volume/scroll down
                    # pyautogui.press('down')
            if totalFingers == 2:
                os.system("touch 2.txt")           # select track 2
            if totalFingers == 3:
                os.system("touch 3.txt")           # select track 3
            if totalFingers == 0:
                # state = "Play"
                os.system("touch pause.txt")       # fist: toggle play/pause
            # if totalFingers == 5:
            #     os.system("touch play.txt")
        cv2.imshow("Media Controller", image)
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
cv2.destroyAllWindows()
cap.release()